Exploring Origins of Beans
Now let’s look at how the coffee origin information (Country.of.Origin, Harvest.Year.Begin, Quality.Standards) affects coffee quality, which we’ll measure against Total Cup Points.
The overall quality score from the cupping test is Total Cup Points (Total.Cup.Points), graded as follows:
SCAA Total Cup Points (100-point scale):
- 90-100: Outstanding
- 85-89.99: Excellent
- 80-84.99: Very Good
- Below 80: Below Grade
To help my analysis, I’m going to create another column, Total.Cup.Result, with the values Outstanding, Excellent, Very Good and Below Grade. I think it will add quick clarity to the visualizations.
## Total.Cup.Points Total.Cup.Result
## 2 89.92 Excellent
## 3 89.75 Excellent
## 4 89.00 Excellent
## 5 88.83 Excellent
## 6 88.83 Excellent
## 7 88.75 Excellent
## 8 88.67 Excellent
## 9 88.42 Excellent
## 10 88.25 Excellent
## 11 88.08 Excellent
## 12 87.92 Excellent
## 13 87.92 Excellent
## 14 87.92 Excellent
## 15 87.83 Excellent
## 16 87.58 Excellent
## 17 87.42 Excellent
## 18 87.33 Excellent
## 19 87.25 Excellent
## 20 87.25 Excellent
## 21 87.25 Excellent
## 22 87.17 Excellent
## 23 87.17 Excellent
## 24 87.08 Excellent
## 25 87.08 Excellent
## 26 86.92 Excellent
## 27 86.92 Excellent
## 28 86.83 Excellent
## 29 86.67 Excellent
## 30 86.58 Excellent
## 31 86.58 Excellent
## 32 86.50 Excellent
## 33 86.42 Excellent
## 34 86.33 Excellent
## 35 86.25 Excellent
## 36 86.25 Excellent
## 37 86.25 Excellent
## 38 86.25 Excellent
## 39 86.25 Excellent
## 40 86.17 Excellent
## 41 86.17 Excellent
## 42 86.17 Excellent
## 43 86.17 Excellent
## 44 86.08 Excellent
## 45 86.08 Excellent
## 46 86.08 Excellent
## 47 86.00 Excellent
## 48 86.00 Excellent
## 49 86.00 Excellent
## 50 86.00 Excellent
## 51 86.00 Excellent
## 52 86.00 Excellent
## 53 85.92 Excellent
## 54 85.92 Excellent
## 55 85.92 Excellent
## 56 85.83 Excellent
## 57 85.83 Excellent
## 58 85.83 Excellent
## 59 85.83 Excellent
## 60 85.75 Excellent
## 61 85.75 Excellent
## 62 85.75 Excellent
## 63 85.58 Excellent
## 64 85.58 Excellent
## 65 85.58 Excellent
## 66 85.50 Excellent
## 67 85.50 Excellent
## 68 85.50 Excellent
## 69 85.50 Excellent
## 70 85.50 Excellent
## 71 85.42 Excellent
## 72 85.42 Excellent
## 73 85.42 Excellent
## 74 85.42 Excellent
## 75 85.42 Excellent
## 76 85.33 Excellent
## 77 85.33 Excellent
## 78 85.33 Excellent
## 79 85.33 Excellent
## 80 85.33 Excellent
## 81 85.33 Excellent
## 82 85.33 Excellent
## 83 85.33 Excellent
## 84 85.25 Excellent
## 85 85.25 Excellent
## 86 85.25 Excellent
## 87 85.17 Excellent
## 88 85.17 Excellent
## 89 85.08 Excellent
## 90 85.08 Excellent
## 91 85.08 Excellent
## 92 85.08 Excellent
## 93 85.08 Excellent
## 94 85.08 Excellent
## 95 85.08 Excellent
## 96 85.08 Excellent
## 97 85.00 Excellent
## 98 85.00 Excellent
## 99 85.00 Excellent
## 100 85.00 Excellent
## 101 85.00 Excellent
## 102 85.00 Excellent
## 103 85.00 Excellent
## 104 85.00 Excellent
## 105 85.00 Excellent
## 106 85.00 Excellent
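The Total.Cup.Result binning can be sketched in Python as below. This is a minimal illustration, not the original R code: the name `cup_result` is hypothetical, and it assumes the lower bound of each band is inclusive (so 85.00 counts as Excellent, which agrees with the 85.00 rows in the output above).

```python
from bisect import bisect_right

# SCAA grade bands. Lower bounds are inclusive: 85.00 -> "Excellent".
# Hypothetical helper for illustration, not the author's R code.
_CUTOFFS = [80.0, 85.0, 90.0]
_LABELS = ["Below Grade", "Very Good", "Excellent", "Outstanding"]


def cup_result(total_cup_points: float) -> str:
    """Map a Total.Cup.Points score (0-100) to its SCAA grade label."""
    # bisect_right counts how many cutoffs are <= the score.
    return _LABELS[bisect_right(_CUTOFFS, total_cup_points)]


if __name__ == "__main__":
    for score in (91.0, 89.92, 85.00, 84.99, 79.5):
        print(score, cup_result(score))
```

Using `bisect_right` on a sorted list of cutoffs keeps the band edges in one place, so changing the grading scale only means editing `_CUTOFFS` and `_LABELS`.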
Correlation between Defects of Beans and Total Cup Points
Result: The correlations are low. Both defect categories correlate negatively with Total Cup Points, and Category Two defects show a slightly stronger correlation (-0.21) than Category One defects (-0.11).
## Category.One.Defects Category.Two.Defects Total.Cup.Points
## Category.One.Defects 1.0000000 0.3422092 -0.1068260
## Category.Two.Defects 0.3422092 1.0000000 -0.2136031
## Total.Cup.Points -0.1068260 -0.2136031 1.0000000
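The matrix above is a Pearson correlation matrix (R’s `cor()` uses Pearson by default). A minimal from-scratch sketch of the same computation in Python follows; the function names are hypothetical and the toy values are made up, not taken from the coffee dataset. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # Made-up toy values, NOT rows from the coffee dataset.
    toy = {
        "Category.One.Defects": [0, 1, 0, 3, 2],
        "Category.Two.Defects": [1, 4, 0, 9, 5],
        "Total.Cup.Points": [86.0, 83.5, 87.2, 80.1, 82.3],
    }
    for (a, b), r in correlation_matrix(toy).items():
        print(f"{a:>22} vs {b:<22} {r: .3f}")
```

A negative value, like the ones in the real matrix, means that more defects tend to go with lower cup scores.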
Country of Origin
The Total Cup Points by Country plot below gives a quick visual of which countries have the most coffee samples and which have higher total cup points.
There are a whole lot of outliers in the total cup points. Outliers can distort statistical analysis, but in this case I feel they can be very informative, since coffee tasting results are so subjective. So instead of removing the outliers, I chose to limit the x-axis and y-axis to better visualize what’s going on, and I will compare the median and the mean to decide which is the better measure of “middle of the road” in later measurements.
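A quick illustration of why that comparison matters, using Python’s standard `statistics` module: one extreme value drags the mean a long way, while the median barely moves. The scores are made-up values with one low outlier, not rows from the dataset.

```python
from statistics import mean, median

# Hypothetical cupping scores: four typical samples plus one very low outlier.
scores = [82.0, 83.0, 84.0, 85.0, 0.0]
without_outlier = [s for s in scores if s > 0]

print("mean with outlier:     ", mean(scores))            # pulled far down
print("median with outlier:   ", median(scores))          # stays near the typical values
print("mean without outlier:  ", mean(without_outlier))
print("median without outlier:", median(without_outlier))
```

Because the median only depends on the middle of the sorted data, it is the more robust choice when outliers are deliberately kept.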

Looking at the data below, it’s easier to see that Mexico, Colombia and Guatemala are the top producers, contributing the most samples (n).
## # A tibble: 37 x 5
## Country.of.Origin total_cup_mean total_cup__medi… total_cup_max n
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Mexico 80.9 81.6 87.2 236
## 2 Colombia 83.1 83.2 86 183
## 3 Guatemala 81.8 82.5 89.8 181
## 4 Brazil 82.4 82.4 88.8 132
## 5 Taiwan 82.0 82 86.6 75
## 6 United States (Hawaii) 81.8 82.8 87.9 73
## 7 Honduras 79.4 81.7 86.7 53
## 8 Costa Rica 82.8 83.2 87.2 51
## 9 Ethiopia 85.5 85.2 90.6 44
## 10 Tanzania, United Republi… 82.4 82.2 86.5 40
## # … with 27 more rows
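The country tables here are per-group summaries (mean, median, max and n of Total Cup Points). A minimal Python sketch of that grouping follows; `summarize_by` and its `min_n` option are hypothetical, and the rows are placeholders. Sorting the same summary by n gives the ranking above, while sorting by median gives the ranking that follows, where `min_n` can screen out countries represented by a single sample.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by(rows, key, value, min_n=1):
    """Group `rows` (dicts) by `key` and summarize the numeric `value` column.

    Returns one dict per group with mean, median, max and n. Groups with
    fewer than `min_n` rows are dropped (a hypothetical option, not
    something the original tables used).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [
        {key: name, "mean": mean(vals), "median": median(vals), "max": max(vals), "n": len(vals)}
        for name, vals in groups.items()
        if len(vals) >= min_n
    ]


if __name__ == "__main__":
    # Hypothetical rows; the country names and scores are placeholders.
    rows = [
        {"Country.of.Origin": "Country A", "Total.Cup.Points": 81.0},
        {"Country.of.Origin": "Country A", "Total.Cup.Points": 83.5},
        {"Country.of.Origin": "Country A", "Total.Cup.Points": 82.2},
        {"Country.of.Origin": "Country B", "Total.Cup.Points": 86.0},
    ]
    summary = summarize_by(rows, "Country.of.Origin", "Total.Cup.Points")
    by_n = sorted(summary, key=lambda r: r["n"], reverse=True)
    # Rank by median, ignoring groups with a single sample.
    by_median = sorted(
        summarize_by(rows, "Country.of.Origin", "Total.Cup.Points", min_n=2),
        key=lambda r: r["median"],
        reverse=True,
    )
    print(by_n)
    print(by_median)
```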
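The table below shows the countries with the highest median Total Cup Points. Keep in mind that several of them (Papua New Guinea, Japan, Ecuador) have only a single sample, so their medians are not very reliable.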
## # A tibble: 37 x 5
## Country.of.Origin total_cup_mean total_cup__median total_cup_max n
## <chr> <dbl> <dbl> <dbl> <int>
## 1 United States 86.0 87.2 87.9 8
## 2 Papua New Guinea 85.8 85.8 85.8 1
## 3 Ethiopia 85.5 85.2 90.6 44
## 4 Japan 84.7 84.7 84.7 1
## 5 Kenya 84.3 84.6 86.2 25
## 6 Panama 83.7 84.1 85.8 4
## 7 Uganda 84.1 83.9 86.8 26
## 8 Ecuador 83.8 83.8 83.8 1
## 9 Colombia 83.1 83.2 86 183
## 10 Costa Rica 82.8 83.2 87.2 51
## # … with 27 more rows
Harvest Year
We already know that 2012 was the biggest production year for coffee between 2008 and 2018.
Which years appear to have the best total cup points? The plot below shows that 2014 has the best total cup points. 2012 definitely produced the most coffee, but most of that coffee scored below the mean line. So it appears 2014 was not only a good year production-wise, but it also produced some good-quality coffee!
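The “below the mean line” comparison can also be made numeric. The sketch below computes, for each harvest year, the sample count, the mean score and the share of that year’s samples scoring below the overall mean. `year_report` is a hypothetical helper and the (year, score) pairs are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def year_report(samples):
    """Summarize (year, score) pairs against the overall mean score.

    Returns the overall mean plus, per year, the sample count, the mean
    score and the share of that year's samples scoring below the overall mean.
    """
    overall = mean([score for _, score in samples])
    by_year = defaultdict(list)
    for year, score in samples:
        by_year[year].append(score)
    report = {}
    for year in sorted(by_year):
        scores = by_year[year]
        below = sum(1 for s in scores if s < overall)
        report[year] = {
            "n": len(scores),
            "mean": mean(scores),
            "share_below_overall_mean": below / len(scores),
        }
    return overall, report


if __name__ == "__main__":
    # Made-up (Harvest.Year.Begin, Total.Cup.Points) pairs for illustration only.
    samples = [(2012, 80.0), (2012, 81.0), (2012, 86.0), (2014, 85.0), (2014, 86.0)]
    overall, report = year_report(samples)
    print(f"overall mean: {overall:.2f}")
    for year, stats in report.items():
        print(year, stats)
```

A year can have the largest n and still a high share below the overall mean, which is the pattern described for 2012.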

Exploring Bean Type and Processing
Now let’s look at the bean and processing information: Variety, Moisture and Processing.Method.
Let’s look at how the moisture in the beans affects how the coffee tastes. Beans should have a moisture content of 0.08 (8%) to 0.12 (12%); anything below or above that range is considered too dry or too wet by coffee standards.
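The moisture rule above can be expressed as a small classifier. This sketch assumes both ends of the 0.08 to 0.12 range count as acceptable (the text does not say whether the bounds are inclusive); `moisture_class` is a hypothetical name and the readings are made up.

```python
from collections import Counter

# Range quoted in the text; treating both bounds as acceptable is an assumption.
LOW, HIGH = 0.08, 0.12


def moisture_class(moisture: float) -> str:
    """Label a moisture fraction as 'too dry', 'ok' or 'too wet'."""
    if moisture < LOW:
        return "too dry"
    if moisture > HIGH:
        return "too wet"
    return "ok"


if __name__ == "__main__":
    readings = [0.0, 0.05, 0.10, 0.11, 0.13]  # hypothetical readings
    print(Counter(moisture_class(m) for m in readings))
```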

When I first ran this, the plot had a low outlier, so I zoomed in to Total Cup Points from 60 to 80 to get a better look. I also added a trend line to see whether there was a correlation between increased moisture and good-tasting coffee.
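The plotting code isn’t shown, so as an assumption, the trend line is treated here as an ordinary least-squares fit of Total Cup Points on Moisture. A minimal sketch of that fit, with a hypothetical `fit_line` helper and made-up points:

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x.

    Returns (intercept, slope).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Made-up (Moisture, Total.Cup.Points) pairs for illustration only.
    moisture = [0.09, 0.10, 0.11, 0.12, 0.12]
    points = [82.5, 83.0, 82.0, 82.8, 81.9]
    intercept, slope = fit_line(moisture, points)
    print(f"Total.Cup.Points ~ {intercept:.2f} + {slope:.2f} * Moisture")
```

If the zoom is done by dropping points outside the window (rather than only changing the visible axis range), the fitted line is computed from the remaining points only, so the slope can differ from a fit on all the data.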

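From the moisture visualization, there appears to be almost no correlation between moisture and total cup points. The correlation test below confirms that the relationship is weak (r ≈ -0.13). With 1,311 samples, however, even this weak negative correlation is statistically significant (p ≈ 5.5e-06), so it is small but not zero.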
##
## Pearson's product-moment correlation
##
## data: coffeedata$Total.Cup.Points and coffeedata$Moisture
## t = -4.5638, df = 1309, p-value = 5.495e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.17808326 -0.07149403
## sample estimates:
## cor
## -0.1251497
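The t statistic and confidence interval in that output follow directly from r and the sample size: t = r·√(df / (1 − r²)) with df = n − 2, and the interval comes from the Fisher z-transformation (atanh), which is how R’s `cor.test` builds it for Pearson’s r. The sketch below recomputes both from the reported r with n = 1309 + 2 = 1311; the function names are hypothetical. The p-value needs the t-distribution CDF, which the standard library lacks (a package such as SciPy provides it), so it is left out.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    df = n - 2
    return r * sqrt(df / (1 - r * r))


def pearson_ci(r: float, n: int, level: float = 0.95):
    """Confidence interval for rho via the Fisher z-transformation."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return tanh(z - crit * se), tanh(z + crit * se)


if __name__ == "__main__":
    r, n = -0.1251497, 1311  # r from the output above; n = df + 2
    print(round(pearson_t(r, n), 4))
    print(pearson_ci(r, n))
```

Running it should reproduce t ≈ -4.564 and an interval of roughly (-0.178, -0.071), matching the R output.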
By switching to a scatter plot and adjusting the alpha (point transparency) level, I could produce a sort of pseudo-heat map. This gives me a better idea of how moisture affects good coffee.

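According to the visualization above and the summary below, the mean moisture is about 0.09 (0.08886) and the median is 0.11, with the middle half of the data between 0.09 and 0.12. The mean sits just below the first quartile because some very low readings (the minimum is 0) pull it down. So it appears that the best coffee falls between 0.10 and 0.12. It seems that moisture does make an impact: it can’t be too high, and it can’t be too low.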
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.09000 0.11000 0.08886 0.12000 0.28000
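The six numbers above come from R’s `summary()`. A rough Python equivalent is sketched below; `r_style_summary` is a hypothetical name. It uses `statistics.quantiles(..., method="inclusive")`, which by my reading interpolates the same way as R’s default (type 7) quantiles. Unlike R, it does not round the printed values.

```python
from statistics import mean, median, quantiles


def r_style_summary(values):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, in the layout of R's summary()."""
    # method="inclusive" treats min and max as the 0th and 100th percentiles
    # and interpolates linearly between sorted values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median(values),
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


if __name__ == "__main__":
    print(r_style_summary([1, 2, 3, 4, 5]))
    print(r_style_summary([1, 2, 3, 4]))
```

Comparing "Mean" against "1st Qu." in the result is a quick way to spot a cluster of very low values, as in the moisture column.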
Next, it’s time to look at the variety of coffee. I grouped the data together by Variety and then summarized the mean and median.
## # A tibble: 6 x 4
## Variety total_cup_mean total_cup_median n
## <chr> <dbl> <dbl> <int>
## 1 Arusha 82.2 82.4 5
## 2 Blue Mountain 82.1 82.1 2
## 3 Bourbon 81.9 82.3 226
## 4 Catimor 83.3 83.2 20
## 5 Catuai 81.3 81.9 74
## 6 Caturra 82.4 83.1 256


In terms of Variety, I’m pretty pleased with what I found. Just from these visualizations, I can tell which three varieties of coffee beans taste best:
- Ethiopian Heirlooms - Ethiopia
- Sumatra Lintong - Indonesia
- SL34 - Mostly found in Kenya
The visualization supports my finding that Ethiopia produces some of the best-tasting coffee (ranked third by median Total Cup Points in the country table). Kenya borders Ethiopia to the south.
Processing method is the next item to look at.
## # A tibble: 6 x 5
## Processing.Method total_cup_mean total_cup_median total_cup_max n
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Natural / Dry 82.4 82.8 89 251
## 2 Other 81.3 81.8 84.7 26
## 3 Pulped natural / honey 82.8 82.7 86.6 14
## 4 Semi-washed / Semi-pulped 82.6 82.5 86.1 56
## 5 Washed / Wet 82.0 82.4 90.6 812
## 6 <NA> 82.4 83.1 89.8 152

Inspired by another coffee lover, I also included this table so you can get an idea of which countries use which methods!
##
## Natural / Dry Other Pulped natural / honey
## Brazil 80 1 7
## Burundi 0 0 0
## China 3 0 1
## Colombia 27 0 0
## Costa Rica 0 1 2
## Cote d'Ivoire 0 0 0
## Ecuador 1 0 0
## El Salvador 1 0 0
## Ethiopia 17 0 0
## Guatemala 10 2 0
## Haiti 1 0 0
## Honduras 14 0 0
## India 1 0 0
## Indonesia 2 4 0
## Japan 0 0 1
## Kenya 2 0 0
## Laos 0 0 0
## Malawi 0 0 0
## Mauritius 0 0 0
## Mexico 17 0 0
## Myanmar 2 1 0
## Nicaragua 4 3 0
## Panama 1 1 0
## Papua New Guinea 0 0 0
## Peru 0 0 0
## Philippines 1 0 0
## Rwanda 0 0 0
## Taiwan 13 9 2
## Tanzania, United Republic Of 1 0 0
## Thailand 2 0 1
## Uganda 7 0 0
## United States 1 1 0
## United States (Hawaii) 40 0 0
## United States (Puerto Rico) 0 0 0
## Vietnam 3 3 0
## Zambia 0 0 0
##
## Semi-washed / Semi-pulped Washed / Wet
## Brazil 24 6
## Burundi 0 1
## China 0 12
## Colombia 0 121
## Costa Rica 1 45
## Cote d'Ivoire 0 1
## Ecuador 0 0
## El Salvador 1 15
## Ethiopia 0 8
## Guatemala 0 161
## Haiti 0 4
## Honduras 0 35
## India 0 0
## Indonesia 5 6
## Japan 0 0
## Kenya 0 22
## Laos 0 3
## Malawi 0 11
## Mauritius 0 0
## Mexico 14 198
## Myanmar 0 5
## Nicaragua 0 11
## Panama 0 2
## Papua New Guinea 0 1
## Peru 0 8
## Philippines 0 4
## Rwanda 0 1
## Taiwan 9 37
## Tanzania, United Republic Of 1 37
## Thailand 0 18
## Uganda 1 18
## United States 0 6
## United States (Hawaii) 0 9
## United States (Puerto Rico) 0 4
## Vietnam 0 1
## Zambia 0 1
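The table above is a two-way frequency count of country by processing method. R’s `table()` leaves NA values out by default (`useNA = "no"`), which is presumably why the 152 samples with a missing Processing.Method are not counted in it. A minimal Python sketch of the same kind of cross-tabulation, with a hypothetical `crosstab` helper and placeholder data:

```python
from collections import Counter


def crosstab(pairs):
    """Two-way frequency table of (row, column) pairs, similar to R's table(x, y).

    Pairs with a missing value (None) are skipped, mirroring table()'s
    default of leaving NA out. Returns sorted row labels, sorted column
    labels and a dict mapping (row, column) -> count, with 0 for unseen
    combinations.
    """
    counts = Counter((r, c) for r, c in pairs if r is not None and c is not None)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = {(r, c): counts[(r, c)] for r in rows for c in cols}
    return rows, cols, table


if __name__ == "__main__":
    # Hypothetical (Country.of.Origin, Processing.Method) pairs.
    pairs = [
        ("Country A", "Natural / Dry"),
        ("Country A", "Washed / Wet"),
        ("Country B", "Washed / Wet"),
        ("Country B", None),  # missing method: left out, like NA in table()
    ]
    rows, cols, table = crosstab(pairs)
    print("".ljust(12) + "".join(c.ljust(16) for c in cols))
    for r in rows:
        print(r.ljust(12) + "".join(str(table[(r, c)]).ljust(16) for c in cols))
```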
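On average (by mean), the Processing Method that produces the highest total cup points is “Pulped natural / honey” (82.8). Note that it covers only 14 samples, and its median (82.7) is essentially tied with Natural / Dry (82.8).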
In case you are wondering:
The pulped natural / honey process begins drying directly after de-pulping, rather than fermenting the beans to remove the mucilage. “Pulped natural” coffee tends to have more fruit and fermented flavors because the bean has more time to interact with the natural sugars from the cherry as enzymes break down the mucilage around it. However, if producers aren’t careful about stirring and watching the beans, funky flavors will emerge in the roasted coffee.
Washed / wet coffees, on the other hand, are known for their vibrant notes. Removing all of the cherry before drying lets the intrinsic flavors of the bean shine without anything holding them back. Fruit notes are still found in washed coffees, but fermented and berry notes are less common.
The Natural / Dry method involves drying whole coffee cherries in the sun, either on patios or on raised beds. This process only works in hot, dry areas, and it tends to give the coffee a more fruity flavor.
I guess it’s a matter of taste!